929 research outputs found

    Information content-based gene ontology semantic similarity approaches: toward a unified framework theory

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches have contributed to improving protein analyses at the functional level. Given the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is needed to provide a theoretical basis for validating them. We review the existing IC-based ontological similarity approaches developed in the biomedical and bioinformatics fields and propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides a theoretical basis for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term’s specificity in the GO DAG.

    Information content-based gene ontology functional similarity measures: which one to use for a given biological data type?

    The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was paired with a specific functional similarity measure depending on its conception and the applications for which it was designed. However, it is not clear whether the specific functional similarity measure associated with a given approach is the most appropriate for a given biological data set or application, i.e., whether it achieves the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, the specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.

    Modelling the risk of airborne infectious disease using exhaled air

    In this paper we develop and demonstrate a flexible mathematical model that predicts the risk of airborne infectious diseases, such as tuberculosis, under steady-state and non-steady-state conditions by monitoring the air exhaled by infectors in a confined space. In developing this model, we used the rebreathed air accumulation rate concept to directly determine the average volume fraction of exhaled air in a given space. From a biological point of view, air exhaled by infectors contains airborne infectious particles that cause airborne infectious diseases such as tuberculosis in confined spaces. Since not all infectious particles can reach the target infection site, we took into account that the infectious particles that commence the infection are determined by the respiratory deposition fraction, which is the probability of each infectious particle reaching the target infection site of the respiratory tract and causing infection. Furthermore, we compute the quantity of carbon dioxide, as a marker of exhaled air, that can be inhaled in the room with a high likelihood of causing airborne infectious disease given the presence of infectors. We demonstrated mathematically and schematically the correlation between TB transmission probability and airborne infectious particle generation rate, ventilation rate, average volume fraction of exhaled air, TB prevalence, and duration of exposure to infectors in a confined space.

    Generation and Analysis of Large-Scale Data-Driven Mycobacterium tuberculosis Functional Networks for Drug Target Identification

    Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast amounts of data generated by these technologies provides a strategy for identifying potential drug targets within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets, functional interaction networks between proteins are used to identify proteins essential to the survival, growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses that use network topological properties to dissect the resulting network and identify drug targets. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.

    Designing with Data, Democratization through data
